Release 10.1A: OpenEdge Development:
Internationalizing Applications
Guidelines for using Unicode
When you use Unicode in OpenEdge applications, the following restrictions, cautions, and suggestions apply:
- With the OpenEdge UTF-8 BASIC collation, composed and decomposed characters are treated as different characters. With the International Components for Unicode (ICU) collations, composed and decomposed characters are treated as the same character for comparisons and indexes.
- The OpenEdge UTF-8 BASIC collation provides for sorting Unicode data in binary order. Alternatively, the ICU collations provide for sorting Unicode data based on the language-specific requirements for a locale.
  Note: You can specify a Progress collation or an ICU collation for sorting data using either the Collation Table (-cpcoll) startup parameter, or the COLLATE option on the FOR statement, the OPEN QUERY statement, and the PRESELECT phrase. For more information on the -cpcoll startup parameter, see OpenEdge Deployment: Startup Command and Parameter Reference. For more information on the 4GL elements, see OpenEdge Development: Progress 4GL Reference. For information about using ICU collations as database collations, see Chapter 6, "Using Databases."
- Before sorting Unicode data with the UTF-8 BASIC collation, normalize the data using the 4GL NORMALIZE function. Normalizing the data converts the data into a standardized form that allows for more accurate and consistent sorting and indexing. This is important when working with characters or sequences of characters that have multiple representations (for example, base characters and combining characters) because it ensures that equivalent strings have a unique binary representation. For more information on the 4GL NORMALIZE function, see OpenEdge Development: Progress 4GL Reference.
  Note: When sorting Unicode data with an ICU collation, you do not need to normalize the data.
- When UTF-8 data contains decomposed characters, you cannot convert it to a single-byte code page. You must first compose the data using the 4GL NORMALIZE function. When you convert data from a single-byte code page to Unicode, the result is always composed data.
- OpenEdge supports code-page conversion to and from UTF-8 the same way it supports code-page conversion to and from other code pages. For more information on code-page conversion, see Chapter 2, "Understanding Code Pages," and Chapter 3, "Understanding Character Processing Tables."
- When an existing database is converted to UTF-8, the amount of storage required by each non-ASCII character increases. Roughly, each non-ASCII Latin-alphabet character converted to UTF-8 tends to require two bytes, while each double-byte Chinese, Japanese, or Korean character converted to UTF-8 tends to require three bytes.
- To display and print Unicode data, consider using a Unicode font. Unicode fonts are available commercially.
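
The guidelines above on composed versus decomposed characters, composing before conversion to a single-byte code page, and UTF-8 storage size can be illustrated outside OpenEdge. The following is a minimal sketch in Python, not 4GL: it assumes Python's standard `unicodedata.normalize` as a stand-in for the 4GL NORMALIZE function and ISO 8859-1 as an example single-byte code page. The helper names (`binary_equal`, `compose`, `to_single_byte`, `utf8_size`) are hypothetical and exist only for this illustration.

```python
import unicodedata

# The same letter in two canonically equivalent forms (illustrative data).
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # base letter "e" followed by COMBINING ACUTE ACCENT


def binary_equal(a: str, b: str) -> bool:
    """Compare two strings by their UTF-8 bytes, as a binary-order comparison would."""
    return a.encode("utf-8") == b.encode("utf-8")


def compose(text: str) -> str:
    """Return the composed (NFC) form of text; a stand-in for normalizing in 4GL."""
    return unicodedata.normalize("NFC", text)


def to_single_byte(text: str, code_page: str = "iso-8859-1") -> bytes:
    """Compose the text first, then convert it to a single-byte code page."""
    return compose(text).encode(code_page)


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    # Without normalization, equivalent strings differ at the byte level.
    print(binary_equal(composed, decomposed))                    # False
    # After normalization, they share one binary representation.
    print(binary_equal(compose(composed), compose(decomposed)))  # True

    # Decomposed data has no mapping in a single-byte code page.
    try:
        decomposed.encode("iso-8859-1")
    except UnicodeEncodeError:
        print("decomposed data cannot be converted directly")
    # Composing first makes the conversion possible.
    print(to_single_byte(decomposed))                            # b'\xe9'

    # Storage: ASCII = 1 byte, accented Latin = 2 bytes, CJK = 3 bytes.
    print(utf8_size("a"), utf8_size("\u00e9"), utf8_size("\u4e2d"))  # 1 2 3
```

The sketch composes inside `to_single_byte` so that callers cannot skip the step. In an OpenEdge application the equivalent step would use the 4GL NORMALIZE function as described in OpenEdge Development: Progress 4GL Reference.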
Copyright © 2005 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095